Automated Legal Clause Extraction and Risk Scoring Using NLP and Generative AI

Authors: Dhaksha Charan R, Siddarth S, Jeevitha M

DOI Link: https://doi.org/10.22214/ijraset.2026.79369

Abstract

Legal documents such as contracts, agreements, and invoices are often complex and time-consuming to analyze manually. This paper presents an AI-powered Legal Document Analysis System (LDAS) that automates clause extraction, risk detection, and executive summary generation using Natural Language Processing (NLP) and cloud computing technologies. The system uses AWS services, including Lambda and S3, together with Generative AI (GenAI) models to process PDF and DOCX documents. It identifies ten standard legal clause types (Parties, Term, Payment Clause, Liability, Confidentiality, Termination, Governing Law, Intellectual Property, Warranties, and Force Majeure) and flags missing or risky clauses against a configurable standard template. A weighted risk scoring algorithm quantifies deviations on a 0-100 scale. In experimental evaluation, the system achieved 100% clause detection accuracy on the primary test document, where a risk score of 87/100 correctly identified 6 high-risk and 2 medium-risk missing clauses with zero false-positive deviations. The proposed system improves efficiency, reduces manual errors, and supports decision-making in legal workflows.

Introduction

This paper describes the development of the Legal Document Analysis System (LDAS), an AI-powered, cloud-based platform that automates legal clause extraction, risk scoring, and contract review. Legal document analysis is complex and time-consuming because contracts and agreements contain large amounts of linguistically dense text. Manual review demands significant effort from legal professionals, which leads to operational bottlenecks, inconsistent results, and poor scalability when document volumes are large.

Earlier legal document processing systems used rule-based methods and machine learning algorithms such as Support Vector Machines and Random Forests, but these approaches required extensive labeled datasets and struggled with diverse legal language. The introduction of transformer-based models like BERT and Legal-BERT improved clause classification and semantic understanding. More recently, Large Language Models (LLMs) and Generative AI enabled zero-shot and few-shot learning, allowing legal clause extraction and summarization without the need for domain-specific training datasets. Prompt engineering and structured JSON outputs further improved automation and downstream processing.

LDAS addresses several research gaps: the absence of real-time, web-based legal analysis tools; limited integration of serverless cloud architectures with Generative AI; the lack of quantified risk scoring; the absence of multi-document comparison; and the shortage of fully deployable, end-to-end legal AI platforms.

The system provides several core functionalities:

Automatic extraction of ten standard legal clauses such as confidentiality, liability, payment terms, intellectual property, warranties, and force majeure.
Detection of missing or risky clauses by comparing extracted clauses with predefined legal templates.
Risk score calculation using a weighted scoring formula to prioritize contracts requiring attention.
Executive summary generation to provide easy-to-understand overviews for non-legal users.
Multi-document comparison for detecting clause-level conflicts between contract versions.
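
The deviation detection and risk scoring steps listed above could be sketched as follows. The paper does not give the scoring formula, so the severity label attached to each clause, the weights in SEVERITY_WEIGHT, and the function names find_missing and risk_score are all illustrative assumptions. In this sketch the score is simply the weighted share of template clauses that are missing, scaled to 0-100.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical template: clause name -> severity if the clause is missing.
# These severity labels and weights are assumptions for illustration,
# not the values used by the authors.
TEMPLATE: Dict[str, str] = {
    "Parties": "high",
    "Term": "medium",
    "Payment Clause": "high",
    "Liability": "high",
    "Confidentiality": "high",
    "Termination": "high",
    "Governing Law": "medium",
    "Intellectual Property": "high",
    "Warranties": "medium",
    "Force Majeure": "medium",
}
SEVERITY_WEIGHT: Dict[str, float] = {"high": 3.0, "medium": 2.0}


@dataclass(frozen=True)
class Deviation:
    clause: str
    severity: str


def find_missing(extracted: Dict[str, Optional[str]]) -> List[Deviation]:
    """Return one deviation per template clause that has no extracted text."""
    return [
        Deviation(name, severity)
        for name, severity in TEMPLATE.items()
        if not (extracted.get(name) or "").strip()
    ]


def risk_score(deviations: List[Deviation]) -> float:
    """Weighted share of the template that is missing, scaled to 0-100."""
    total = sum(SEVERITY_WEIGHT[s] for s in TEMPLATE.values())
    missing = sum(SEVERITY_WEIGHT[d.severity] for d in deviations)
    return round(100.0 * missing / total, 1)
```

Under these assumed weights, a contract missing only Force Majeure scores 7.7 and a contract missing every clause scores 100. The sketch is not expected to reproduce the scores reported in the paper, since the actual weights are not published here.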

The architecture follows a three-tier cloud-native design (presentation, processing, and storage), complemented by a Generative AI engine:

Presentation Layer: A React.js web application hosted on Amazon S3 provides interfaces for single-document analysis and document comparison. Results are displayed using color-coded indicators for extracted clauses and risk levels.
Processing Layer: AWS Lambda serverless functions handle document upload, text extraction, AI-based clause extraction, deviation detection, and summary generation. The serverless architecture provides scalability without infrastructure management.
Storage Layer: Amazon S3 securely stores uploaded documents, extracted text, JSON analysis results, and executive summaries using encrypted storage.
AI Engine: A Generative AI model accessed through AWS Bedrock performs clause extraction and summary generation using structured zero-shot prompts and returns JSON-formatted outputs.
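
As a rough illustration of the AI Engine step, the sketch below builds a structured zero-shot prompt and normalises a JSON reply into the ten clause types. The prompt wording, the function names, and the invoke_model callable are assumptions made for this example. In the described system that callable would wrap a request to the model through AWS Bedrock, whose request and response format depends on the chosen model and is not specified in the source.

```python
import json
from typing import Callable, Dict, Optional

CLAUSE_TYPES = [
    "Parties", "Term", "Payment Clause", "Liability", "Confidentiality",
    "Termination", "Governing Law", "Intellectual Property", "Warranties",
    "Force Majeure",
]


def build_prompt(document_text: str) -> str:
    """Zero-shot prompt: instructions and output schema only, no examples."""
    names = ", ".join(CLAUSE_TYPES)
    return (
        "You are a contract analyst. Extract the following clause types "
        f"from the contract below: {names}.\n"
        "Respond with a single JSON object whose keys are exactly these "
        "clause names and whose values are the clause text, or null if "
        "the clause is absent.\n\n"
        f"CONTRACT:\n{document_text}"
    )


def parse_clauses(raw_reply: str) -> Dict[str, Optional[str]]:
    """Pull the JSON object out of the reply and map it onto CLAUSE_TYPES."""
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("model reply contains no JSON object")
    data = json.loads(raw_reply[start:end + 1])
    clauses: Dict[str, Optional[str]] = {}
    for name in CLAUSE_TYPES:
        value = data.get(name)
        # Treat missing keys, nulls, non-strings, and blank text as absent.
        is_text = isinstance(value, str) and value.strip()
        clauses[name] = value.strip() if is_text else None
    return clauses


def extract_clauses(
    document_text: str, invoke_model: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Run one prompt through the supplied model callable and parse the reply."""
    return parse_clauses(invoke_model(build_prompt(document_text)))
```

Passing the model call in as a parameter keeps the prompt and parsing logic testable without cloud credentials; a stub that returns a fixed JSON string is enough to exercise it.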

Conclusion

The Legal Document Analysis System (LDAS) demonstrates how Artificial Intelligence can transform legal workflows. By automating clause extraction, risk detection, deviation scoring, and executive summarization, the system reduces manual effort and improves consistency in legal document review. The integration of AWS serverless cloud technologies provides elastic scalability and operational reliability, making the system suitable for real-world deployment across legal, corporate, and governmental institutions. Experimental evaluation demonstrated 100% clause detection accuracy on the primary test document with zero false-positive deviations, 94.2% average accuracy on synthetic documents, and a risk scoring algorithm achieving a correlation of r = 0.87 with expert legal judgment. Multi-document comparison achieved 93.3% conflict detection accuracy. End-to-end processing completes in 45-90 seconds, a 60x improvement over equivalent manual review time. Future enhancements, including OCR support, multilingual processing, configurable contract-type templates, and enterprise system integration, will further extend the system's applicability across diverse legal domains and organizational contexts. LDAS demonstrates that Generative AI combined with a cloud-native serverless architecture can meaningfully augment human legal professionals, enabling them to focus their expertise on high-value substantive work.

Copyright

Copyright © 2026 Dhaksha Charan R, Siddarth S, Jeevitha M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET79369

Publish Date : 2026-04-03

ISSN : 2321-9653

Publisher Name : IJRASET
